@alexeykudinkin
Contributor

What is the purpose of the pull request

Refactoring layout optimization (clustering) flow to

  • Enable support for linear (lexicographic) ordering as one of the ordering strategies (along w/ Z-order, Hilbert)
  • Reconcile Layout Optimization and Clustering configuration to be more congruent
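To make the difference between the ordering strategies concrete, here is a minimal, self-contained sketch comparing linear (lexicographic) ordering with Z-ordering on two integer columns. It is not the code from this PR: the class `OrderingSketch` and all of its methods are hypothetical, and it assumes non-negative values that fit in 16 bits.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration only; none of these names come from Hudi.
public class OrderingSketch {

    // Z-value: interleave the low 16 bits of x (even positions) and y (odd positions).
    static long zValue(int x, int y) {
        long z = 0L;
        for (int i = 0; i < 16; i++) {
            z |= ((long) ((x >> i) & 1)) << (2 * i);
            z |= ((long) ((y >> i) & 1)) << (2 * i + 1);
        }
        return z;
    }

    // Linear (lexicographic) ordering: compare by x first, then by y.
    static List<int[]> sortLinear(List<int[]> rows) {
        List<int[]> out = new ArrayList<>(rows);
        out.sort(Comparator.<int[]>comparingInt(r -> r[0]).thenComparingInt(r -> r[1]));
        return out;
    }

    // Z-ordering: compare by the interleaved bit pattern of both columns.
    static List<int[]> sortZOrder(List<int[]> rows) {
        List<int[]> out = new ArrayList<>(rows);
        out.sort(Comparator.comparingLong((int[] r) -> zValue(r[0], r[1])));
        return out;
    }

    // All (x, y) pairs with x and y in {0, 1}.
    static List<int[]> grid2x2() {
        List<int[]> rows = new ArrayList<>();
        for (int x = 0; x < 2; x++) {
            for (int y = 0; y < 2; y++) {
                rows.add(new int[] {x, y});
            }
        }
        return rows;
    }

    static String fmt(List<int[]> rows) {
        return rows.stream()
                .map(r -> "(" + r[0] + "," + r[1] + ")")
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println("linear:  " + fmt(sortLinear(grid2x2())));
        System.out.println("z-order: " + fmt(sortZOrder(grid2x2())));
    }
}
```

On the 2x2 grid, linear ordering keeps all rows with the same `x` together, while Z-ordering alternates between the two columns, so neither column fully dominates the sort.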

Brief change log

  • Refactored layout optimization flow to enable support for linear (lexicographic) ordering in column-stats indexes
  • Reconciled Layout Optimization and Clustering configuration to be more congruent
  • Refactored tests to validate full matrix of all optimization strategies, spatial curve composition strategies

Verify this pull request

This pull request is already covered by existing tests, such as (please describe tests).

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@nsivabalan
Contributor

@alexeykudinkin : is there anyone you know who will review this patch, or do you want me to review it?

@nsivabalan
Contributor

btw, it looks like there are some CI failures. Can you please check them?

Contributor

@nsivabalan nsivabalan left a comment

LGTM in general. One clarification: we are not changing any config names, right? I did verify, but just wanted to confirm. If we are, we might have to add the older config as an alternative.
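For context, the "older config as alternative" idea can be sketched as a lookup that falls back to previous key names when the current key is not set. This is only an illustration under assumptions: the class `ConfigAlternativesSketch`, the `resolve` helper, and both key names are hypothetical and are not taken from Hudi.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration only; keys and helper are not real Hudi names.
public class ConfigAlternativesSketch {

    static final String NEW_KEY = "example.clustering.layout.strategy";
    static final List<String> OLD_KEYS = List.of("example.layout.optimize.strategy");

    // Returns the value for the current key if present, otherwise the first
    // value found under one of the older (alternative) keys.
    static Optional<String> resolve(Map<String, String> props, String key, List<String> alternatives) {
        String value = props.get(key);
        if (value != null) {
            return Optional.of(value);
        }
        for (String alt : alternatives) {
            String altValue = props.get(alt);
            if (altValue != null) {
                return Optional.of(altValue);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // A user who still sets only the old key keeps getting their value.
        Map<String, String> legacy = Map.of("example.layout.optimize.strategy", "z-order");
        System.out.println(resolve(legacy, NEW_KEY, OLD_KEYS).orElse("<unset>"));
    }
}
```

With this shape, a renamed key would not silently drop values that users already set under the old name.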

@alexeykudinkin
Contributor Author

@nsivabalan correct, all configs are kept and marked as deprecated. The only thing that changes is that some of them no longer have any effect. How should we handle this?

For example, LAYOUT_OPTIMIZATION_ENABLE is not used anymore, but that should not affect users:

  1. Those who didn't use Clustering based on Spatial Curves will see no change (other configs are required to enable that)
  2. Those who did use Clustering based on Spatial Curves will also not be affected, b/c it also required clustering to be enabled (which they should already have had enabled)

* The more columns involved in sorting, the worse the aggregation, and the smaller the query performance improvement.
* Choose the filter columns which commonly used in query sql as sort columns.
* It is recommend that 2 ~ 4 columns participate in sorting.
* @deprecated this setting has no effect
Contributor

Can you add to the documentation which other config(s) the user is supposed to look at instead of this deprecated one?
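As a rough sketch of what such documentation could look like, the deprecated constant's Javadoc can name the setting to use instead. Everything here is an assumption for illustration: the class `DeprecatedConfigSketch`, both constants, and their key strings are hypothetical and do not reflect the actual Hudi config names or the final wording in the PR.

```java
// Hypothetical illustration only; constants and keys are not real Hudi names.
public class DeprecatedConfigSketch {

    /** Hypothetical replacement setting that users should configure instead. */
    public static final String CLUSTERING_ENABLE_KEY = "example.clustering.enable";

    /**
     * @deprecated this setting has no effect; refer to
     *             {@link #CLUSTERING_ENABLE_KEY} instead.
     */
    @Deprecated
    public static final String LAYOUT_OPTIMIZE_ENABLE_KEY = "example.layout.optimize.enable";

    // Reports whether a public field of this class carries @Deprecated.
    static boolean isDeprecated(String fieldName) {
        try {
            return DeprecatedConfigSketch.class
                    .getField(fieldName)
                    .isAnnotationPresent(Deprecated.class);
        } catch (NoSuchFieldException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isDeprecated("LAYOUT_OPTIMIZE_ENABLE_KEY"));
        System.out.println(isDeprecated("CLUSTERING_ENABLE_KEY"));
    }
}
```

Pairing the `@deprecated` Javadoc tag with a `{@link}` gives users a direct pointer from the old setting to its replacement.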

Contributor Author

Updated

Contributor

@nsivabalan nsivabalan left a comment

LGTM. one nit on documentation about deprecating configs

Alexey Kudinkin and others added 2 commits January 21, 2022 12:35
Elaborated what configs users should refer to instead
@nsivabalan
Contributor

@alexeykudinkin : I pushed a minor update to fix the build failure.

@nsivabalan nsivabalan merged commit bc7882c into apache:master Jan 24, 2022
alexeykudinkin pushed a commit to onehouseinc/hudi that referenced this pull request Jan 25, 2022
…low to support linear ordering (apache#4606)

Refactoring layout optimization (clustering) flow to
- Enable support for linear (lexicographic) ordering as one of the ordering strategies (along w/ Z-order, Hilbert)
- Reconcile Layout Optimization and Clustering configuration to be more congruent
vingov pushed a commit to vingov/hudi that referenced this pull request Jan 26, 2022
…low to support linear ordering (apache#4606)

liusenhua pushed a commit to liusenhua/hudi that referenced this pull request Mar 1, 2022
…low to support linear ordering (apache#4606)

vingov pushed a commit to vingov/hudi that referenced this pull request Apr 3, 2022
…low to support linear ordering (apache#4606)
